Architecture

Engine: Dremel

Combines a columnar data layout with a tree architecture for executing queries.
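As a rough illustration of the tree idea, the sketch below assumes the tree is a root node that fans a query out through intermediate nodes to leaf workers, each of which scans its own shard and returns a partial result that is merged on the way back up. The `Node` class, the `mixers` name, and the shard contents are made up for this example and are not taken from Dremel itself; a real system would also run the leaves in parallel, while this sketch runs them one after another.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a hypothetical serving tree (illustrative only)."""
    children: list = field(default_factory=list)
    shard: list = field(default_factory=list)  # only leaf nodes hold data

    def partial_sum_count(self):
        """Return (sum, count) for all data stored under this node."""
        if not self.children:
            # Leaf: scan the local shard.
            return sum(self.shard), len(self.shard)
        # Inner node: merge the partial results of the children.
        partials = [child.partial_sum_count() for child in self.children]
        return sum(s for s, _ in partials), sum(c for _, c in partials)


# Four leaves, two intermediate nodes, one root.
leaves = [Node(shard=[1, 2, 3]), Node(shard=[4, 5]),
          Node(shard=[6]), Node(shard=[7, 8, 9])]
mixers = [Node(children=leaves[:2]), Node(children=leaves[2:])]
root = Node(children=mixers)

total, count = root.partial_sum_count()
print(total / count)  # 5.0 (average over all shards)
```

Only small `(sum, count)` pairs travel up the tree, so the root never has to see the raw rows.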

File System: Colossus

(Google's distributed file system) BigQuery stores its data there using columnar storage and compression.

The columnar format is called "Capacitor".

Column vs Row format

Row: the column values of each record are stored together and can't be read separately. Column: each column is stored in its own file, and those files can be on separate disks.

In row format, even if we select only some columns, every full row still has to be read. In column format, only the selected columns are read. This saves time, and the per-column reads can be parallelized.
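A minimal sketch of the difference, using plain Python lists as stand-ins for files on disk. The table, its column names, and the "values touched" counter are invented for illustration; real storage engines work with blocks and bytes rather than individual values.

```python
# Toy table with three columns; names and values are made up.
rows = [
    {"id": 1, "country": "US", "amount": 10},
    {"id": 2, "country": "DE", "amount": 25},
    {"id": 3, "country": "US", "amount": 7},
]

# Row layout: each record keeps all of its column values together.
row_store = [(r["id"], r["country"], r["amount"]) for r in rows]

# Column layout: one independent list per column
# (a stand-in for one file per column).
column_store = {
    "id": [r["id"] for r in rows],
    "country": [r["country"] for r in rows],
    "amount": [r["amount"] for r in rows],
}


def sum_amount_row(store):
    """Sum 'amount' from the row layout; returns (total, values_touched)."""
    total, touched = 0, 0
    for record in store:
        touched += len(record)  # the whole record is read to reach one field
        total += record[2]
    return total, touched


def sum_amount_column(store):
    """Sum 'amount' from the column layout; returns (total, values_touched)."""
    column = store["amount"]  # only this column is read
    return sum(column), len(column)


print(sum_amount_row(row_store))        # (42, 9)
print(sum_amount_column(column_store))  # (42, 3)
```

Both layouts give the same answer, but the column layout touches a third of the values here, and each column could be scanned by a different worker.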

Usual columnar formats vs BigQuery's capacitor

The difference lies in how compressed data is handled:

BigQuery can operate directly on compressed data, without decompressing it first.
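To make "operating on compressed data" concrete, here is a small sketch that uses run-length encoding as the compression scheme. This is an assumption chosen for illustration: the notes above do not say which encodings Capacitor uses, and the function names below are hypothetical. The point is only that a count or a sum can be computed from the runs themselves, without expanding them back into rows.

```python
from itertools import groupby


def rle_encode(values):
    """Compress a list into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def count_equal(runs, target):
    """Count rows equal to target using only the run lengths."""
    return sum(length for value, length in runs if value == target)


def total(runs):
    """Sum a numeric column from its runs: value * run_length per run."""
    return sum(value * length for value, length in runs)


country = ["US", "US", "US", "DE", "DE", "US"]
amount = [5, 5, 5, 5, 2, 2]

country_runs = rle_encode(country)  # [("US", 3), ("DE", 2), ("US", 1)]
amount_runs = rle_encode(amount)    # [(5, 4), (2, 2)]

print(count_equal(country_runs, "US"))  # 4
print(total(amount_runs))               # 24
```

The work done is proportional to the number of runs rather than the number of rows, which is where the saving comes from when a column has long stretches of repeated values.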

General architecture

[Figure: general architecture diagram]